Artificial Intelligence and Speech Recognition




Faculty Mentor:
Ms. Yogita

Student Name:
Ankita Rao (MCA 1st Year)
Tanya Tyagi (MCA 1st Year)




Figure 1: Speech Recognition

The field of artificial intelligence has been moving extremely quickly in the last few years. Are you aware of how AI will affect your life in the years ahead? This paper looks at how artificial intelligence can augment people's abilities, enabling us to accomplish more, eliminating tedious repetitive tasks, and allowing us to spend more time on our creative endeavors. AI may prove even more impactful than the invention of the personal computer and the spread of mobile phones into our pockets. The idea of artificial intelligence is not new; it has been around since the very earliest days of computing.

Figure 2: Turn to AI Voice recognition

Since the first Industrial Revolution of the late eighteenth century, in less than 250 years, we have catapulted from horse-drawn carts to self-driving cars, from navigating by the stars to relying on voice-activated GPS instructions, and from penning letters to loved ones to having awkward conversations with Siri. The world has seen four major industrial revolutions that changed its entire face, and the fourth, which we are experiencing right now, is the revolution of artificial intelligence. It may interest you to know that many years ago, a handful of scientists thought of creating an artificial brain.

2. APPROACH TO SPEECH RECOGNITION SYSTEM

Early forms of voice recognition had very limited vocabularies. Some of the first systems from the 1950s could recognize only about 10 words, and even about 30 years later that number had grown to only around 20,000. That may seem like a lot, but the English language has, by some counts, over a million words. On top of that, early software could not use context to predict which words you were trying to say, so to these programs it was just as likely that you were saying "Hello Banana" as "Hello Nana".

Voice input has since become popular, partly because some people cannot type their queries properly but can explain them verbally. It is so natural and simple to use that one wonders why companies like Amazon did not introduce it years earlier. Modern software even learns from real search-engine query strings and can recognize a variety of accents, so you can use it whether you are from Eastern Canada or southern Texas. But can the power of the cloud do more than just answer what you ask? Your speech may be imperfect, but speech recognition has now reached an acceptable level of accuracy for most people, with all major platforms reporting an error rate of about 5%. Perhaps this is why consumers are becoming increasingly comfortable with the assistants offered by major tech companies. Isn't it fascinating that the software not only listens to individual words but also uses probabilities to determine what you are trying to say? This is a fairly involved process that relies on complicated statistical models.

Greater processing power has enabled everything from real-time translation, to talking to game characters through a VR headset, to emotion tracking, in which a computer uses the pitch and other qualities of your voice to figure out how you are feeling. We are even seeing voice recognition installed in fighter aircraft, so pilots can concentrate on mission objectives instead of fiddling with cockpit switches. But although voice recognition has come a really long way, its expansion has presented us with some new challenges.
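The idea of using context and probabilities to choose between "Hello Banana" and "Hello Nana" can be illustrated with a tiny bigram language model. This is a minimal sketch, assuming the acoustic stage has already produced two equally plausible candidate transcripts; the corpus is invented for illustration, and add-one smoothing is chosen only for simplicity. Real recognizers combine acoustic scores with far larger language models.

```python
from collections import Counter
import math

# Hypothetical toy corpus: invented sentences standing in for the large
# text collections a real system would learn word statistics from.
CORPUS = [
    "hello nana how are you",
    "hello nana it is good to see you",
    "say hello to nana",
    "i ate a banana",
    "a ripe banana",
]

def train_bigrams(sentences):
    """Count words and adjacent word pairs; <s> marks the start of a sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one (Laplace) smoothed bigram log-probability of a transcript."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(words, words[1:])
    )

def pick_transcript(candidates, unigrams, bigrams):
    """Among acoustically similar candidates, return the most probable one."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))

unigrams, bigrams = train_bigrams(CORPUS)
print(pick_transcript(["hello banana", "hello nana"], unigrams, bigrams))  # prints: hello nana
```

Because "hello" is followed by "nana" in the toy corpus but never by "banana", the model prefers "hello nana", which is exactly the kind of contextual judgment early systems could not make.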
One big concern has been finding ways to filter out background noise so that you still get correct results, even if you are waiting in the middle of a busy street. Nowadays we are all familiar with the technology we hold in our hands, as well as with the technology that companies are still working on. We use Google Assistant to set a morning alarm, Alexa to play a song, Cortana to tell us interesting facts, Siri to make a phone call, and much more. Speech recognition technology enables us to perform various tasks, whether it is ordering pizza or checking updates from around the world. Slowly but steadily, this technology is changing our lives and the way we manage our work. Voice-based recognition systems can even identify a person from their spoken words.
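One of the simplest ideas for separating speech from background noise is an energy-based noise gate (a basic form of voice activity detection): split the audio into short frames and keep only frames that are clearly louder than the estimated noise floor. The sketch below is a deliberately simple, swapped-in technique, not what commercial assistants use; the frame size (20 ms at 8 kHz), the noise quantile and the threshold factor are hypothetical choices.

```python
import math
import random

def frame_energies(samples, frame_size):
    """Mean squared amplitude of each non-overlapping frame of samples."""
    return [
        sum(s * s for s in samples[start:start + frame_size]) / frame_size
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def speech_frames(samples, frame_size=160, noise_quantile=0.2, factor=4.0):
    """Flag frames whose energy clearly exceeds an estimated noise floor.

    Assumes the quietest `noise_quantile` share of frames contain only
    background noise, and treats their energy level as the noise floor.
    """
    energies = frame_energies(samples, frame_size)
    ranked = sorted(energies)
    noise_floor = ranked[int(noise_quantile * (len(ranked) - 1))]
    return [energy > factor * noise_floor for energy in energies]

# Synthetic test signal: one second at 8 kHz of faint random noise, with a
# louder 200 Hz tone (a stand-in for a voice) from 0.4 s to 0.6 s.
rate = 8000
rng = random.Random(0)
signal = [rng.gauss(0.0, 0.01) for _ in range(rate)]
for n in range(3200, 4800):
    signal[n] += 0.5 * math.sin(2 * math.pi * 200 * n / rate)

flags = speech_frames(signal)
print([i for i, is_speech in enumerate(flags) if is_speech])
```

This only works when the voice is much louder than the background. On a busy street that assumption breaks down, which is why robust noise handling remains a hard problem.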


Figure 3: First Mobile Device with voice recognition

The generation of the human voice involves a mixture of behavioral and physiological characteristics. The physiological part depends on the shape and size of the vocal tract, lips, nasal cavities, and mouth. The movement of the lips, jaw, tongue, velum, and voice box makes up the behavioral part, which may vary over time with a person's age and medical condition (e.g., a common cold). The spectral content of the voice is analyzed to extract its intensity, duration, quality, and pitch, and this information is used to build a model (typically a Hidden Markov Model) for speaker recognition. Speaker recognition is well suited to applications such as tele-banking, but it is quite sensitive to background noise and playback spoofing. Voice biometrics are therefore primarily used in verification mode, that is, to confirm a claimed identity rather than to identify an unknown speaker.

Then there is another massive issue: privacy. Many types of voice recognition software improve by learning user habits, and combined with cloud processing, this has already raised real concerns, as with Samsung Smart TVs. In 2015, Samsung's Smart TV privacy policy contained wording that some people believed allowed the company to monitor conversations in your living room.
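To make the role of the Hidden Markov Model in verification mode more concrete, here is a minimal sketch. It assumes, hypothetically, that the voice has already been reduced to a sequence of quantized pitch symbols (0 = low, 1 = mid, 2 = high) and that a two-state model of the claimed speaker and a uniform "background" model already exist; all probabilities are invented. The scaled forward algorithm scores the sequence under each model, and the claim is accepted if the average log-likelihood ratio exceeds a hypothetical threshold. Real systems use continuous spectral features and trained models.

```python
import math

def forward_log_likelihood(observations, start, trans, emit):
    """Scaled forward algorithm: log P(observations | model) for a discrete HMM.

    start[i]    -- probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][o]  -- probability that state i emits symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][observations[0]] for i in range(n)]
    total = sum(alpha)
    log_likelihood = math.log(total)
    alpha = [a / total for a in alpha]  # rescale to avoid numeric underflow
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs]
            for j in range(n)
        ]
        total = sum(alpha)  # P(this observation | all earlier observations)
        log_likelihood += math.log(total)
        alpha = [a / total for a in alpha]
    return log_likelihood

# Hypothetical models over 3 quantized pitch symbols (0=low, 1=mid, 2=high).
SPEAKER = dict(
    start=[0.6, 0.4],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit=[[0.1, 0.3, 0.6], [0.2, 0.5, 0.3]],
)
BACKGROUND = dict(
    start=[0.5, 0.5],
    trans=[[0.5, 0.5], [0.5, 0.5]],
    emit=[[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]],
)

def verify(observations, speaker, background, threshold=0.0):
    """Accept the identity claim if the per-symbol log-likelihood ratio exceeds threshold."""
    llr = (forward_log_likelihood(observations, **speaker)
           - forward_log_likelihood(observations, **background))
    return llr / len(observations) > threshold

print(verify([2, 2, 1, 2, 2, 2], SPEAKER, BACKGROUND))  # True: mostly high pitch, like the speaker model
print(verify([0, 0, 0, 1, 0, 0], SPEAKER, BACKGROUND))  # False: mostly low pitch
```

Comparing against a background model, rather than using the speaker model's score alone, keeps the decision from depending on how likely a sequence is in general.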

3. BEHIND THE MIC



Speech technology involves text processing, feature extraction, and speech generation in order to convert text to speech and vice versa. To convert spoken words into on-screen text, a computer must go through several complex steps; it is not as easy as it might seem. When we speak, we create vibrations in the air. An analog-to-digital converter (ADC) turns this analog signal into digital data that a computer can work with. To do so, it samples the sound, measuring the amplitude of the wave at frequent, regular intervals of time.

Isn't it amazing to see this fusion of many technologies, not only studied but implemented in real life? We have come this far, and it is a remarkable achievement for us humans. Yet Stephen Hawking warned that AI could spell the end of the human race. Wait, what? Yes, you heard it right. Different types of artificial intelligence have been described. Artificial Super Intelligence would have capabilities that surpass human limits: such machines would be able to think and make decisions just like humans and to improve themselves. Humanity would stand no chance against such a SINGULARITY, and it could threaten human existence itself.
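The analog-to-digital conversion step described in this section (measuring the sound wave at regular intervals and storing each measurement as a number) can be simulated in a few lines. This is a minimal sketch, assuming an ideal signal given as a function of time with values in [-1, 1]; the 1 kHz test tone, the 8 kHz sample rate and the 8-bit depth are illustrative choices, and a real ADC is a hardware component.

```python
import math

def sample_and_quantize(signal, sample_rate, n_samples, bits):
    """Simulate an ADC: measure `signal` every 1/sample_rate seconds and
    round each measurement to one of 2**bits integer codes.

    `signal` is any function of time (in seconds) returning values in [-1, 1].
    """
    max_code = 2 ** bits - 1
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                               # time of the n-th measurement
        value = signal(t)
        code = round((value + 1.0) / 2.0 * max_code)      # map [-1, 1] onto [0, max_code]
        codes.append(min(max(code, 0), max_code))         # clip out-of-range input
    return codes

def dequantize(code, bits):
    """Map an integer code back to an approximate amplitude in [-1, 1]."""
    max_code = 2 ** bits - 1
    return code / max_code * 2.0 - 1.0

def tone(t):
    return math.sin(2 * math.pi * 1000 * t)  # a 1 kHz test tone

codes = sample_and_quantize(tone, sample_rate=8000, n_samples=8, bits=8)
print(codes)
```

Rounding to a fixed number of levels means each stored value can be off by at most half a quantization step; more bits per sample shrink that error, and a higher sample rate captures higher frequencies.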

4. APPLICATION IN REAL LIFE


  • Cortana
  • Siri
  • 'Ok Google'
  • Alexa
  • Self-Driving Cars: use sensors to detect pedestrians and help prevent accidents.
  • Saregama Carvaan
  • Smart TV
  • Dragon Medical Practice Edition